Document supported `databricks_retry_args` usage for deferrable Databricks operators by kosiew · Pull Request #68017 · apache/airflow

kosiew · 2026-06-04T12:04:17Z

Part of Callable deserialization causing error with DatabricksExecutionTrigger #64609

What does this PR do?

Follow up for #64960

Adds documentation clarifying which databricks_retry_args configurations are supported when using Databricks operators in deferrable mode.

The new documentation:

Explains that databricks_retry_args must be serialization-safe because it is serialized across the trigger boundary when deferrable=True.
Documents supported value types (plain Python primitives and collections of primitives).
Provides examples of supported configurations, such as {"reraise": True}.
Explains that retry count and delay should be configured through the dedicated retry_limit and retry_delay operator parameters.
Documents unsupported configurations, including Tenacity strategy objects (stop_after_attempt, wait_incrementing, etc.) and arbitrary callables.
Recommends using the non-deferrable Databricks operators when custom callable retry strategies are required.
Adds a Databricks provider changelog entry describing the documentation update.

Does this PR introduce any user-facing change?

Yes. Documentation now explicitly describes the serialization requirements and supported shapes for databricks_retry_args in deferrable Databricks operators, helping users avoid unsupported retry configurations.

How was this patch tested?

No tests were added or modified. This PR contains documentation and changelog updates only.

Was generative AI tooling used to co-author this PR?

Yes (please specify the tool below)
ChatGPT

…ferrable operators Add "Retry args in deferrable mode" subsection under DatabricksSubmitRunDeferrableOperator and DatabricksRunNowDeferrableOperator explaining: - Serialization requirement: only plain Python primitives allowed across the trigger boundary - Supported shapes (int/float primitives, nested plain-dict) - Unsupported shapes (Tenacity objects, callables) with note that a ValueError is raised at task submission - Recommended workaround: use non-deferrable mode for custom retry strategies Also update changelog for 7.16.0.

eladkal · 2026-07-01T06:35:29Z

cc @moomindani for review

moomindani

Thanks for the follow-up — this is exactly the docs fix scoped at the end of the #64960 thread, and the structure of the two new sections is clean (heading levels, anchors, and code blocks all check out). However, two of the content claims do not match what the merged code actually does, and two files should not be in the diff:

The "Supported (serialization-safe and runtime-valid)" example {"reraise": True} is not runtime-valid: BaseDatabricksHook replaces (does not merge) the default retry_args, so stop/wait are lost and tenacity falls back to stop_never + wait_none() — an infinite, zero-backoff retry loop. Setting databricks_retry_args at all also makes retry_limit/retry_delay no-ops. Since tenacity strategy objects (the only way to set stop/wait inside retry_args) are exactly what deferrable mode rejects, the honest guidance is "do not use databricks_retry_args in deferrable mode; use retry_limit/retry_delay", rather than a supported example.
"will raise ValueError at task submission" overstates the timing: validation runs in the trigger constructors at defer time, after submit_run/run_now has already launched the Databricks run. (The #64960 thread did say "before any Databricks API call", but re-checking the merged code, the operator path submits first and defers second — operators/databricks.py:834-836.)
generated/provider_dependencies.json.sha256sum is a stale-base rebase artifact — please rebase onto current main and drop it.
The hand-added changelog entry should be removed (release-manager-maintained file; see inline).

Details inline.

Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting

moomindani · 2026-07-02T02:04:48Z

@@ -1 +1 @@
-2d6f34bb40832f84cb6c121237b1c5b0a05181dccface9fd171558f4df1747dc
+2ccde55d75b93c7fc2c5723fc7f74bf8995244606190c98acf005ea1f39f04ca


This file was intentionally deleted and gitignored on main by #68801 (it is auto-regenerated by breeze when needed), so this PR re-introduces a file that no longer exists upstream — and the hash edit itself is byte-for-byte identical to the already-merged #68011. It looks like the branch was cut in the window between #67080 (which mistakenly re-added the file) and #68011. Could you rebase onto current main and drop this file from the diff? This is the same stale-base issue flagged during the #64960 review round.

Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting

moomindani · 2026-07-02T02:04:48Z

+
+When ``deferrable=True``, the ``databricks_retry_args`` dictionary is serialized across the
+trigger boundary and must contain only Airflow-serializable values (plain Python primitives
+such as ``int``, ``float``, ``str``, ``bool``, ``None``, ``dict``, and ``list``).


Two precision issues here:

The boundary is Airflow serde, not plain primitives. Databricks: Fail fast for non-serializable retry_args in deferrable operators and triggers #64960 deliberately switched the validator from json.dumps back to airflow.sdk.serde.serialize precisely because serde accepts more than primitives — the merged tests cover a datetime value as supported. Documenting "plain Python primitives" re-narrows what Databricks: Fail fast for non-serializable retry_args in deferrable operators and triggers #64960 resolved to keep broad; "Airflow-serde-serializable values" (with primitives as the common case) would be accurate.

Serializability is necessary but not sufficient: the validator checks values only, not whether the keys are valid tenacity.Retrying kwargs. For example {"stop": 3} passes validation, then fails at runtime with TypeError: 'int' object is not callable when tenacity invokes stop(retry_state). Worth stating that passing validation does not mean the retry config is meaningful.

Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting

moomindani · 2026-07-02T02:04:48Z

+.. code-block:: python
+
+    # Only plain-primitive Retrying kwarg: reraise
+    databricks_retry_args = {"reraise": True}


This example is serialization-safe but not runtime-valid as documented. When retry_args is provided, BaseDatabricksHook.__init__ replaces the defaults rather than merging (hooks/databricks_base.py:151-161 — only retry and after are re-injected), so stop and wait are gone and tenacity falls back to stop_never + wait_none(). On a persistently retryable API error this retries forever with zero backoff. It also contradicts the paragraph below: once databricks_retry_args is set, retry_limit/retry_delay are ignored entirely (they are only consulted in the else branch), so the two cannot be combined. Given deferrable mode rejects the tenacity objects that would restore stop/wait, I would replace the "Supported" example with explicit guidance: in deferrable mode, leave databricks_retry_args unset and use retry_limit/retry_delay.

Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting

moomindani · 2026-07-02T02:04:48Z

+``wait``, ``retry``, ``before``, ``after``, etc.) require tenacity
+callable objects, which are not serialization-safe in deferrable mode.
+
+**Not supported** in deferrable mode (will raise ``ValueError`` at task submission):


"At task submission" overstates the timing. The validation added by #64960 lives only in the trigger constructors (triggers/databricks.py:61,160), and the operator defers after submitting: submit_run/run_now runs first, then _handle_deferrable_databricks_operator_execution builds the trigger (operators/databricks.py:834-836, 1183-1185). So with invalid retry_args the Databricks run is already launched, and the task fails at defer time, leaving that run in flight. (And if the run is already terminal on the first poll, the trigger is never built and no error is raised at all.) Suggest: "will raise ValueError when the task defers — note the Databricks run has already been submitted at that point".

Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting

moomindani · 2026-07-02T02:04:48Z

+    databricks_retry_args = {"retry": my_custom_retry_callable}
+
+If you need a custom callable retry strategy, use the non-deferrable
+:class:`~airflow.providers.databricks.operators.DatabricksRunNowOperator` (``deferrable=False``).


The cross-reference path is missing the module segment: the class lives at airflow.providers.databricks.operators.databricks.DatabricksRunNowOperator, and operators/__init__.py has no re-exports, so this xref will not resolve. Sibling pages (sql_statements.rst, notebook.rst, task.rst) use the full path. The same short form already exists in the pre-existing intro lines of these two pages — since this PR touches both files, it would be good to fix those occurrences too.

Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting

moomindani · 2026-07-02T02:04:48Z

+Retry args in deferrable mode
+-----------------------------
+
+When ``deferrable=True``, the ``databricks_retry_args`` dictionary is serialized across the


Same comments as the equivalent section in run_now.rst apply to this copy: the {"reraise": True} example, the "plain primitives" framing, the "at task submission" timing, and the :class: path.

Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting

moomindani · 2026-07-02T02:04:48Z

 ~~~~~~~~

 * ``Fail fast for non-serializable retry_args in deferrable operators and triggers (#64960)``
+* ``Document supported retry_args shapes for deferrable Databricks operators``


Please drop this entry. Per the NOTE TO CONTRIBUTORS at the top of this file, the changelog is maintained semi-automatically by the release manager, and contributor edits are only for breaking-change guidance. A hand-added line (without the (#NNNNN) suffix, and under "Features" for a doc-only change) will conflict with the generated entries at release time — the commit message alone is sufficient.

Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting

moomindani · 2026-07-02T04:36:22Z

Small correction to one point in my review: generated/provider_dependencies.json.sha256sum was re-added to main by #68775 a few hours before I posted (the same accidental re-add pattern as #67080, after #68801 had removed it), so "a file that no longer exists upstream" is no longer accurate. The requested action is unchanged: rebase onto current main and keep this file out of this PR's diff.

Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting

kosiew added 2 commits June 4, 2026 18:20

Fix Databricks deferrable retry docs and validation

283df3e

kosiew force-pushed the callable-deserialization-02-64609 branch from 8b7caa8 to 283df3e Compare June 4, 2026 14:14

moomindani suggested changes Jul 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Document supported `databricks_retry_args` usage for deferrable Databricks operators#68017

Document supported `databricks_retry_args` usage for deferrable Databricks operators#68017
kosiew wants to merge 2 commits into
apache:mainfrom
kosiew:callable-deserialization-02-64609

kosiew commented Jun 4, 2026

Uh oh!

eladkal commented Jul 1, 2026

Uh oh!

moomindani left a comment

Uh oh!

moomindani Jul 2, 2026

Uh oh!

moomindani Jul 2, 2026

Uh oh!

moomindani Jul 2, 2026

Uh oh!

moomindani Jul 2, 2026

Uh oh!

moomindani Jul 2, 2026

Uh oh!

moomindani Jul 2, 2026

Uh oh!

moomindani Jul 2, 2026

Uh oh!

moomindani commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		@@ -1 +1 @@
		2d6f34bb40832f84cb6c121237b1c5b0a05181dccface9fd171558f4df1747dc
		2ccde55d75b93c7fc2c5723fc7f74bf8995244606190c98acf005ea1f39f04ca

Uh oh!

Conversation

kosiew commented Jun 4, 2026

What does this PR do?

Does this PR introduce any user-facing change?

How was this patch tested?

Was generative AI tooling used to co-author this PR?

Uh oh!

eladkal commented Jul 1, 2026

Uh oh!

moomindani left a comment

Choose a reason for hiding this comment

Uh oh!

moomindani Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

moomindani Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

moomindani Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

moomindani Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

moomindani Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

moomindani Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

moomindani Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

moomindani commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants